Decision Jungles: Compact and Rich Models for Classification
Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi
Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.
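The split-and-merge objective described above, the weighted sum of entropies at the leaves, can be sketched in a few lines. This is a minimal illustration with our own function names, not the paper's implementation:

```python
import math

def entropy(hist):
    """Shannon entropy of a class histogram (list of class counts)."""
    total = sum(hist)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in hist if c > 0)

def weighted_leaf_entropy(leaf_hists):
    """Objective minimized by both splitting and merging: sum over leaves of
    (number of samples at the leaf) times (entropy of its class histogram)."""
    return sum(sum(h) * entropy(h) for h in leaf_hists)

def merge(h1, h2):
    """Merging two nodes in a DAG sums their class histograms."""
    return [a + b for a, b in zip(h1, h2)]

# Two pure leaves of the same class: merging them leaves the objective at 0,
# which is how a jungle can cut its node count without hurting training loss.
before = weighted_leaf_entropy([[10, 0], [8, 0]])
after = weighted_leaf_entropy([merge([10, 0], [8, 0])])
```

A merge is kept whenever it does not increase this objective, so leaves with similar class distributions collapse together and the node count stops growing exponentially with depth.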
Deep Neural Decision Forest: A Novel Approach for Predicting Recovery or Decease of COVID-19 Patients with Clinical and RT-PCR
Dehghani, Mohammad, Yazdanparast, Zahra, Samani, Rasoul
COVID-19 continues to be considered an endemic disease in spite of the World Health Organization's declaration that the pandemic is over. The pandemic disrupted people's lives in unprecedented ways and caused widespread morbidity and mortality. As a result, it is important for emergency physicians to identify patients with a higher mortality risk in order to prioritize hospital equipment, especially in areas with limited medical services. Data collected from patients is beneficial for predicting the outcome of COVID-19 cases, although there is a question about which data makes the most accurate predictions. Therefore, this study aims to accomplish two main objectives. First, we examine whether deep learning algorithms can predict a patient's mortality. Second, we investigate the impact of clinical and RT-PCR data on prediction to determine which is more reliable. We defined four stages with different feature sets and used interpretable deep learning methods to build appropriate models. Based on the results, the deep neural decision forest performed the best across all stages and proved its capability to predict the recovery and death of patients. Additionally, the results indicate that clinical data alone (without the use of RT-PCR) is the most effective method of diagnosis, with an accuracy of 80%. It is important to document and understand experiences from the COVID-19 pandemic in order to aid future medical efforts. This study can provide guidance for medical professionals in the event of a crisis or outbreak similar to COVID-19. Keywords: Machine Learning, Deep Learning, Deep Neural Decision Forest, COVID-19, Polymerase Chain Reaction, RT-PCR.
1. Introduction
COVID-19 was first observed as a deadly illness in the Wuhan region of China in 2019. It was highly contagious and spread rapidly through direct contact with an infected individual [1].
Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library
Guillame-Bert, Mathieu, Bruch, Sebastian, Stotz, Richard, Pfeifer, Jan
Yggdrasil Decision Forests is a library for the training, serving and interpretation of decision forest models, targeted both at research and production work, implemented in C++, and available in C++, command line interface, Python (under the name TensorFlow Decision Forests), JavaScript, Go, and Google Sheets (under the name Simple ML for Sheets). The library has been developed organically since 2018 following a set of four design principles applicable to machine learning libraries and frameworks: simplicity of use, safety of use, modularity and high-level abstraction, and integration with other machine learning libraries. In this paper, we describe those principles in detail and present how they have been used to guide the design of the library. We then showcase the use of our library on a set of classical machine learning problems. Finally, we report a benchmark comparing our library to related solutions.
Very fast, approximate counterfactual explanations for decision forests
Carreira-Perpiñán, Miguel Á., Hada, Suryabhan Singh
We consider finding a counterfactual explanation for a classification or regression forest, such as a random forest. This requires solving an optimization problem to find the closest input instance to a given instance for which the forest outputs a desired value. Finding an exact solution has a cost that is exponential on the number of leaves in the forest. We propose a simple but very effective approach: we constrain the optimization to only those input space regions defined by the forest that are populated by actual data points. The problem reduces to a form of nearest-neighbor search using a certain distance on a certain dataset. This has two advantages: first, the solution can be found very quickly, scaling to large forests and high-dimensional data, and enabling interactive use. Second, the solution found is more likely to be realistic in that it is guided towards high-density areas of input space.
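The reduction described above can be sketched as follows. This is a simplified illustration using a plain Euclidean nearest-neighbor search over training points the forest assigns to the desired class; the paper's region-based formulation and its tailored distance are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def nearest_counterfactual(forest, X_train, x, desired_class):
    """Among training points the forest already assigns to the desired class,
    return the one closest to x. Because the answer is an actual data point,
    it lies in a populated region of input space and tends to be realistic."""
    preds = forest.predict(X_train)
    candidates = X_train[preds == desired_class]
    if len(candidates) == 0:
        return None
    dists = np.linalg.norm(candidates - x, axis=1)
    return candidates[np.argmin(dists)]

# Toy usage: flip a class-0 point to class 1 on synthetic 2-D data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
x = np.array([-2.0, 0.0])          # the forest predicts class 0 here
cf = nearest_counterfactual(forest, X, x, desired_class=1)
```

The search costs one forest evaluation per candidate plus a nearest-neighbor scan, rather than an enumeration that is exponential in the number of leaves.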
Simplification of Forest Classifiers and Regressors
Nakamura, Atsuyoshi, Sakurada, Kento
We study the problem of sharing as many branching conditions of a given forest classifier or regressor as possible while maintaining classification performance. As a constraint to prevent accuracy degradation, we first require that the decision paths of all the given feature vectors remain unchanged. For a branching condition that tests whether the value of a certain feature is at most a given threshold, the set of threshold values satisfying this constraint can be represented as an interval. Thus, the problem reduces to finding, for each set of branching conditions on the same feature, a minimum set of values intersecting all the constraint-satisfying intervals. We propose an algorithm for the original problem that uses an efficient algorithm for this subproblem. The constraint is later relaxed to promote further sharing of branching conditions, by allowing the decision paths of a certain ratio of the given feature vectors to change, or by allowing a certain number of constraint-satisfying intervals to go non-intersected. We also extend our algorithm to both relaxations. The effectiveness of our method is demonstrated through comprehensive experiments using 21 datasets (13 classification and 8 regression datasets from the UCI Machine Learning Repository) and 4 classifiers/regressors (random forest, extremely randomized trees, AdaBoost, and gradient boosting).
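The interval subproblem mentioned above, finding a minimum set of values that intersects all constraint-satisfying intervals, is the classic minimum interval piercing problem. A textbook greedy solution (not the authors' full algorithm) looks like this:

```python
def min_piercing_points(intervals):
    """Smallest set of values hitting every interval [lo, hi]. Classic greedy:
    sort by right endpoint, and take an interval's right end whenever the last
    chosen point misses it. Each chosen value can serve as a shared branching
    threshold for all conditions whose intervals it pierces."""
    points = []
    last = None
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if last is None or lo > last:
            last = hi
            points.append(hi)
    return points

# Three conditions on one feature with these constraint-satisfying intervals:
ivs = [(0.1, 0.5), (0.3, 0.9), (0.8, 1.2)]
min_piercing_points(ivs)   # → [0.5, 1.2]: two shared thresholds suffice
```

The greedy choice is optimal for intervals on a line, and the sort makes the whole subproblem O(n log n) per feature.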
Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise
Gunasar, Tekin, Rekesh, Alexandra, Nair, Atul, King, Penelope, Markova, Anastasiya, Zhang, Jiaqi, Tate, Isabel
Electromyography (EMG) signals can be used as training data by machine learning models to classify various gestures. We seek to produce a model that classifies six different hand gestures from a limited number of samples and generalizes well to a wider audience, while comparing the effect of our feature extraction results on model accuracy against more conventional methods, such as AR parameters computed on a sliding window across the channels of a signal. We appeal to a set of more elementary methods, such as random bounds on a signal, and aim to show the power these methods can carry in an online EMG classification setting, as opposed to more complicated methods such as the Fourier transform. To augment our limited training data, we used a standard technique known as jitter, where random noise is added to each observation in a channel-wise manner. Once all datasets were produced using the above methods, we performed a grid search with Random Forest and XGBoost to ultimately create a high-accuracy model. For human-computer interface purposes, high-accuracy classification of EMG signals is of particular importance to their functioning, and given the difficulty and cost of amassing biomedical data in high volume, it is valuable to have techniques that work with a small number of high-quality samples and use less expensive feature extraction methods that can reliably be carried out in an online application.
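The jitter augmentation described above can be sketched as follows. Drawing a separate noise scale per channel is our reading of the "random variance Gaussian noise" in the title, and `sigma_max` is an assumed hyperparameter, tuned per dataset in practice:

```python
import numpy as np

def jitter(emg, sigma_max=0.1, rng=None):
    """Channel-wise jitter: draw a noise standard deviation per channel
    uniformly from [0, sigma_max), then add zero-mean Gaussian noise with
    that std to every sample in the channel. `emg` has shape
    (n_windows, n_channels)."""
    rng = np.random.default_rng() if rng is None else rng
    sigmas = rng.uniform(0.0, sigma_max, size=emg.shape[1])  # one std per channel
    return emg + rng.normal(size=emg.shape) * sigmas         # broadcasts over rows

# Double a small dataset by appending one jittered copy of each window.
X = np.random.default_rng(0).normal(size=(10, 8))   # 10 windows, 8 channels
X_aug = np.vstack([X, jitter(X, sigma_max=0.1, rng=np.random.default_rng(1))])
```

Because the noise is zero-mean and small relative to the signal, the jittered copies keep their gesture labels while roughly doubling the effective training set size.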
Classification of the Chess Endgame problem using Logistic Regression, Decision Trees, and Neural Networks
In this study, we worked on the classification of the Chess Endgame problem using different algorithms: logistic regression, decision trees, and neural networks. Our experiments indicate that neural networks provide the best accuracy (85%), followed by decision trees (79%). We ran these experiments using Microsoft Azure Machine Learning as a case study in using visual programming for classification. Our experiments demonstrate that this tool is powerful and saves a lot of time; it could also be improved with more features that increase usability and reduce the learning curve. We also developed an application for dataset visualization using a new programming language called Ring. Our experiments show that this language has a simple design like Python while integrating RAD tools like Visual Basic, which is good for GUI development in the open-source world.
Network Module Detection from Multi-Modal Node Features with a Greedy Decision Forest for Actionable Explainable AI
Pfeifer, Bastian, Saranti, Anna, Holzinger, Andreas
Network-based algorithms are used in most domains of research and industry in a wide variety of applications and are of great practical use. In this work, we demonstrate subnetwork detection based on multi-modal node features using a new Greedy Decision Forest for better interpretability. The latter will be a crucial factor in retaining experts and gaining their trust in such algorithms in the future. To demonstrate a concrete application example, we focus in this paper on bioinformatics and systems biology with a special focus on biomedicine. However, our methodological approach is applicable in many other domains as well. Systems biology serves as a very good example of a field in which statistical data-driven machine learning enables the analysis of large amounts of multi-modal biomedical data. This is important to reach the future goal of precision medicine, where the complexity of patients is modeled on a system level to best tailor medical decisions, health practices and therapies to the individual patient. Our glass-box approach could help to uncover disease-causing network modules from multi-omics data to better understand diseases such as cancer.
Modeling Text with Decision Forests using Categorical-Set Splits
Guillame-Bert, Mathieu, Bruch, Sebastian, Mitrichev, Petr, Mikheev, Petr, Pfeifer, Jan
Decision forest algorithms model data by learning a binary tree structure recursively where every node splits the feature space into two regions, sending examples into the left or right branches. This "decision" is the result of the evaluation of a condition. For example, a node may split input data by applying a threshold to a numerical feature value. Such decisions are learned using (often greedy) algorithms that attempt to optimize a local loss function. Crucially, whether an algorithm exists to find and evaluate splits for a feature type (e.g., text) determines whether a decision forest algorithm can model that feature type at all. In this work, we set out to devise such an algorithm for textual features, thereby equipping decision forests with the ability to directly model text without the need for feature transformation. Our algorithm is efficient during training and the resulting splits are fast to evaluate with our extension of the QuickScorer inference algorithm. Experiments on benchmark text classification datasets demonstrate the utility and effectiveness of our proposal.
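The kind of condition involved can be illustrated for a categorical-set (token) feature. This sketch shows only the inference-time test, not the greedy training-time selection of the node's token set:

```python
def categorical_set_condition(example_tokens, node_tokens):
    """Evaluate a categorical-set split: the example is routed to the positive
    branch iff its set of tokens intersects the token set learned at the node.
    This is one plausible form of set-valued condition; the function and
    variable names are ours, not the paper's."""
    return not example_tokens.isdisjoint(node_tokens)

# A node whose learned token set captures positive sentiment words:
node = {"great", "excellent", "loved"}
categorical_set_condition({"the", "movie", "was", "great"}, node)   # → True
categorical_set_condition({"terrible", "plot"}, node)               # → False
```

Unlike a numerical threshold, such a condition consumes raw tokenized text directly, which is what lets the forest model text without a separate feature-transformation step.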
Learning Representations for Axis-Aligned Decision Forests through Input Perturbation
Bruch, Sebastian, Pfeifer, Jan, Guillame-bert, Mathieu
Axis-aligned decision forests have long been the leading class of machine learning algorithms for modeling tabular data. In many applications of machine learning such as learning-to-rank, decision forests deliver remarkable performance. They also possess other coveted characteristics such as interpretability. Despite their widespread use and rich history, decision forests to date fail to consume raw structured data such as text, or learn effective representations for them, a factor behind the success of deep neural networks in recent years. While there exist methods that construct smoothed decision forests to achieve representation learning, the resulting models are decision forests in name only: They are no longer axis-aligned, use stochastic decisions, or are not interpretable. Furthermore, none of the existing methods are appropriate for problems that require a Transfer Learning treatment. In this work, we present a novel but intuitive proposal to achieve representation learning for decision forests without imposing new restrictions or necessitating structural changes. Our model is simply a decision forest, possibly trained using any forest learning algorithm, atop a deep neural network. By approximating the gradients of the decision forest through input perturbation, a purely analytical procedure, the decision forest directs the neural network to learn or fine-tune representations. Our framework has the advantage that it is applicable to any arbitrary decision forest and that it allows the use of arbitrary deep neural networks for representation learning. We demonstrate the feasibility and effectiveness of our proposal through experiments on synthetic and benchmark classification datasets.
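The input-perturbation idea can be illustrated with a central finite-difference estimate of a trained forest's gradients with respect to its inputs. This is a simplified sketch, not the paper's exact scheme:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def perturbation_gradient(forest, x, eps=0.1):
    """Estimate d f(x) / d x for a trained forest f by central finite
    differences, one coordinate at a time. A forest is piecewise constant,
    so its true gradient is zero almost everywhere; eps must be large enough
    to cross split thresholds for the estimate to carry useful signal."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        f_plus = forest.predict((x + e).reshape(1, -1))[0]
        f_minus = forest.predict((x - e).reshape(1, -1))[0]
        grad[i] = (f_plus - f_minus) / (2 * eps)
    return grad

# Toy usage: the target depends only on feature 0, with slope 3.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = 3.0 * X[:, 0]
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
g = perturbation_gradient(forest, [0.0, 0.0], eps=0.3)
# g[0] should land near 3 and g[1] near 0 for this target
```

A purely analytical estimate like this is what lets gradient signal flow from the forest back into a neural network underneath it, without changing the forest's axis-aligned, deterministic structure.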